Entry Name:  "BFC-Banerjee-MC2"

VAST Challenge 2015
Mini-Challenge 2

 

 

Team Members:


Pranab Banerjee, Boston Fusion Corp., pranab.banerjee@bostonfusion.com

Student Team: NO



Did you use data from both mini-challenges? NO



Analytic Tools Used:


R and Python programming languages



Approximately how many hours were spent working on this submission in total?

22 hours



May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2015 is complete? YES



Video Download

Video:

https://drive.google.com/file/d/0Bx4gFykeYQnHdGczVjYyTEhoR3c/view?usp=sharing



-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Questions

MC2.1 Identify those IDs that stand out for their large volumes of communication. For each of these IDs

      a.      Characterize the communication patterns you see.

On all three days, two IDs, 839736 and 1278894, clearly stood out from all others in the volume of texts sent. For example, Figure 1 shows a bar graph of the number of texts sent by each ID on Saturday. The unusually tall bar on the left corresponds to ID 839736, and the tallest bar on the right corresponds to ID 1278894. There were 5297 distinct user IDs on this day.


Figure 1
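The per-ID count behind Figure 1 amounts to a frequency count over sender IDs. A minimal Python sketch follows, assuming each communication record is a dict with a "from" key holding the sender ID (the column name is an assumption about the MC2 CSV layout); the function name top_senders and the file name in the comment are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def top_senders(records: Iterable[Mapping[str, str]], k: int = 2) -> list[tuple[str, int]]:
    """Return the k sender IDs with the most texts, as (id, count) pairs.

    Assumes each record has a "from" field holding the sender ID.
    """
    counts = Counter(rec["from"] for rec in records)
    return counts.most_common(k)


# Toy records (made up for illustration, not challenge data).
toy = [{"from": s} for s in ["a", "b", "a", "c", "a", "b"]]
print(top_senders(toy))  # [('a', 3), ('b', 2)]

# With the real data (hypothetical file name):
#   import csv
#   with open("comm-data-Sat.csv", newline="") as f:
#       print(top_senders(csv.DictReader(f)))
```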

Interesting characteristics of these two IDs:

  1. Both IDs send texts to a large number of users at the same instant, so they are able to broadcast a text. For example, on Friday, ID 1278894 sent 38,658 texts with only 60 unique timestamps.

  2. They never send texts to any “external” destination.

  3. The number of unique IDs they send texts to is identical to the number of unique IDs they receive texts from.

  4. While ID 839736 sends and receives texts throughout the park's opening hours, ID 1278894 primarily sends and receives in five distinct time intervals per day. These intervals are the same on each day and start almost exactly at noon (Figures 2 and 3). The temporal distribution of texts received by ID 839736 is quite different on Sunday from the other days (Figure 9).


Figure 2


Figure 3
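Characteristics 1 to 3 above can be read off a small per-ID profile. The sketch below is illustrative only; it assumes records are dicts with "Timestamp", "from" and "to" fields and that texts leaving the park carry "to" == "external" (both assumed encodings), and the name id_profile is hypothetical.

```python
from typing import Mapping, Sequence


def id_profile(records: Sequence[Mapping[str, str]], node_id: str) -> dict[str, int]:
    """Summarize the send/receive behavior of one ID.

    Assumes "Timestamp", "from" and "to" fields, and that texts sent out of
    the park are recorded with "to" == "external" (assumed encoding).
    """
    sent = [r for r in records if r["from"] == node_id]
    received = [r for r in records if r["to"] == node_id]
    return {
        "texts_sent": len(sent),
        # Many texts sharing few timestamps suggests broadcasting.
        "unique_send_timestamps": len({r["Timestamp"] for r in sent}),
        "external_texts_sent": sum(1 for r in sent if r["to"] == "external"),
        "distinct_recipients": len({r["to"] for r in sent}),
        "distinct_senders": len({r["from"] for r in received}),
    }
```

In these terms, item 1 is a large ratio of texts_sent to unique_send_timestamps, item 2 is external_texts_sent == 0, and item 3 is distinct_recipients == distinct_senders.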

      b.      Based on these patterns, what do you hypothesize about these IDs?

These two IDs send all of their texts from the Entry Corridor area and never move. Based on this, and on their ability to broadcast a text to a large number of visitors, they are hypothesized to belong to park management. Based on the patterns in Figures 2 and 3, ID 1278894 is probably running a raffle or promotion during those intervals: visitors text in, and ID 1278894 announces winners every minute within each interval. ID 839736 receives texts throughout the day and is likely a security officer.
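The interval structure attributed to ID 1278894 can be checked mechanically by merging the minutes in which it was active into runs. A minimal sketch, assuming activity has already been reduced to integer minute indices; the function name active_intervals and the max_gap tolerance are hypothetical, not values taken from the analysis.

```python
from typing import Iterable


def active_intervals(minutes: Iterable[int], max_gap: int = 5) -> list[tuple[int, int]]:
    """Merge activity minutes into (start, end) intervals.

    Minutes no more than `max_gap` apart are placed in the same interval.
    """
    intervals: list[list[int]] = []
    for m in sorted(set(minutes)):
        if intervals and m - intervals[-1][1] <= max_gap:
            intervals[-1][1] = m  # extend the current interval
        else:
            intervals.append([m, m])  # start a new interval
    return [(start, end) for start, end in intervals]


print(active_intervals([1, 2, 3, 10, 11, 30], max_gap=2))  # [(1, 3), (10, 11), (30, 30)]
```

If the pattern described above holds and max_gap is chosen sensibly, the number of intervals returned for ID 1278894 on a given day would be expected to be about five.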



MC2.2 Describe up to 10 communications patterns in the data. Characterize who is communicating, with whom, when and where. If you have more than 10 patterns to report, please prioritize those patterns that are most likely to relate to the crime.

Some of the interesting patterns observed in the communication data are described below:

2.2.1 Pattern of communication from Coaster Alley and Wet Land over time

Bar plots of the total number of texts sent from each of the five areas (Entry Corridor, Kiddie Land, Tundra Land, Wet Land, and Coaster Alley) at one-minute resolution were computed for all three days. The plots for Coaster Alley in particular, and also for Wet Land, showed interesting patterns: for both areas, the Sunday pattern was very different from the Friday and Saturday patterns.

Figures 4 and 5 show these bar plots for the Coaster Alley area on Saturday and Sunday, respectively. The plot for Friday is very similar to Figure 4. The communication volume clearly surges twice per day, at the same times each day, except that the second surge is missing on Sunday. It is hypothesized that these two surges correspond to the times when Scott Jones appears at the pavilion, which is in this area. The absence of the second surge on Sunday indicates that Scott did not appear at that time, most likely because the crime happened earlier and the pavilion was closed.


Figure 4


Figure 5
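The one-minute bar plots for each area rest on binning texts by minute. A minimal sketch follows, assuming "Timestamp" and "location" fields and a "%Y-%m-%d %H:%M:%S" timestamp format (all assumptions about the data files); the function name per_minute_counts and the start argument (for example, park opening time on a given day) are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Mapping

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def per_minute_counts(records: Iterable[Mapping[str, str]], area: str,
                      start: datetime) -> list[int]:
    """Count texts sent from `area` in each minute after `start`.

    Index i holds the number of texts sent in [start + i min, start + (i+1) min).
    Records before `start` are ignored.
    """
    counts = Counter()
    for r in records:
        if r["location"] != area:
            continue
        ts = datetime.strptime(r["Timestamp"], TIME_FORMAT)
        minute = int((ts - start).total_seconds() // 60)
        if minute >= 0:
            counts[minute] += 1
    length = max(counts) + 1 if counts else 0
    return [counts[i] for i in range(length)]
```

The resulting list can be passed directly to a bar-plotting routine, one bar per minute.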

Figures 6 and 7 show the per-minute bar plots of communication from the Wet Land area for Saturday and Sunday. The pattern for Friday was similar to Figure 6 and is not shown here due to space limitations. Figure 7 clearly deviates from the normal pattern shown in Figure 6: there is a clear increase in the number of texts sent shortly after the 200th minute. Since the entrance to the pavilion is in this area, it is hypothesized that this is when the crime was discovered. This surge was likely due to communication with park security.


Figure 6


Figure 7
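The deviation between Figures 6 and 7 can be located automatically by scanning two per-minute series for the first minute where the Sunday count sits well above a baseline day's count. This is a sketch under stated assumptions: the function name first_surge is hypothetical, and the ratio and min_excess thresholds are placeholders that would need tuning against the real plots.

```python
from typing import Optional, Sequence


def first_surge(target: Sequence[int], baseline: Sequence[int],
                ratio: float = 2.0, min_excess: int = 20) -> Optional[int]:
    """Return the first minute index at which `target` exceeds `baseline`
    by at least `ratio` times and by at least `min_excess` texts.

    Returns None when no minute satisfies both conditions.
    """
    for minute, (t, b) in enumerate(zip(target, baseline)):
        # max(b, 1) keeps the ratio test meaningful when the baseline is zero.
        if t >= ratio * max(b, 1) and t - b >= min_excess:
            return minute
    return None


print(first_surge([5, 6, 50, 60], [5, 5, 6, 5]))  # 2
```

Requiring both a relative and an absolute excess avoids flagging minutes where a tiny baseline count merely doubles.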

2.2.2 Pattern in texts received by ID 839736 by the minute during park hours

Bar plots were computed for the number of texts received by ID 839736 in each minute during park hours. This ID was chosen based on the hypothesis described in Section MC2.1. Figures 8 and 9 show the bar plots for Saturday and Sunday, respectively.




Figure 8


Figure 9

It is clear from Figure 9 that the Sunday pattern is very different from the normal pattern shown in Figure 8. There is a sharp increase in the number of texts received shortly after the 200th minute. The timing of this increase in incoming texts coincides with the increase in outgoing texts from Wet Land shown in Figure 7. This bolsters the hypothesis that ID 839736 belongs to park security, and that the crime was discovered at the time corresponding to this surge in incoming texts in Figure 9, which is noon.
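Whether two per-minute surges coincide (outgoing texts from Wet Land and incoming texts to ID 839736 on Sunday) can be quantified with a simple lag search. This swaps in a raw cross-correlation over a small lag window, which is not necessarily how the coincidence was judged here; the function name best_lag and the max_lag window are hypothetical.

```python
from typing import Sequence


def best_lag(a: Sequence[float], b: Sequence[float], max_lag: int = 10) -> int:
    """Return the lag (in minutes) of `b` relative to `a` that maximizes the
    raw cross-correlation sum(a[i] * b[i + lag]) over overlapping indices.

    A positive result means features in `b` occur later than in `a`.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[i] * b[i + lag] for i in range(len(a)) if 0 <= i + lag < len(b))
        if score > best_score:
            best, best_score = lag, score
    return best


print(best_lag([0, 0, 5, 9, 5, 0, 0, 0], [0, 0, 0, 0, 5, 9, 5, 0]))  # 2
```

A lag near zero supports the claim that the two surges coincide. Applying the search to Sunday-minus-baseline differences rather than raw counts keeps the ordinary daily rhythm from dominating the score.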



2.2.3 Patterns in the size of communicating groups in the park

A graph-theoretic approach was taken to analyze group structures based on communication patterns. On each day, there was one very large group in which each member had two-way communication with every other member of the group. This group was far larger than any other group on the same day: on Friday, the largest group contained 1559 members and the next largest 43; on Saturday, the largest group had 2520 members and the next largest 43; on Sunday, the largest group had 2590 members and the next largest 44.

Figure 10 shows the histogram of group sizes for Sunday, excluding the largest group so that the x-axis scale remains readable. The two highest-volume IDs (839736 and 1278894) are excluded from the communication graph, since their communications do not reflect communication within visitor groups; external communications are also excluded. The histogram in Figure 10 is typical of all three days. It shows that thousands of visitors do not communicate with anyone in the park, either because they are visiting alone or because they stay with their group the whole time, making text messaging unnecessary. The second most common group size is two, followed closely by groups of size three; there are close to a hundred groups of each of these sizes. Larger groups are much less frequent.
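The text does not spell out the exact grouping procedure. The sketch below assumes groups are connected components of an undirected graph whose edges are two-way (mutual) text exchanges, which is a looser criterion than requiring every pair of members to have texted each other. It also assumes "from"/"to" fields and an "external" destination value; the function name group_sizes is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterable, Mapping

# IDs left out of the graph, following the text: the two broadcast IDs and
# the "external" destination (assumed spelling of that field value).
DEFAULT_EXCLUDE = frozenset({"839736", "1278894", "external"})


def group_sizes(records: Iterable[Mapping[str, str]],
                exclude: frozenset = DEFAULT_EXCLUDE) -> list[int]:
    """Sizes of connected components in the graph of two-way texting pairs,
    largest first. IDs with no two-way partner form groups of size one.
    """
    nodes = set()
    directed = set()
    for r in records:
        s, t = r["from"], r["to"]
        nodes.update(n for n in (s, t) if n not in exclude)
        if s not in exclude and t not in exclude and s != t:
            directed.add((s, t))

    # Keep an undirected edge only when texts went both ways.
    adj = defaultdict(set)
    for s, t in directed:
        if (t, s) in directed:
            adj[s].add(t)
            adj[t].add(s)

    # Iterative depth-first search collects the size of each component.
    seen = set()
    sizes = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Histogram without the largest group, as in Figure 10:
#   Counter(group_sizes(records)[1:])
```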

It was also found that the IDs:
927280, 1075446, 1297787, 1765818, 1397469, 1975667, 1495961, 1595318, 1248261, 1809285, 620186, 444761, 268856, 487752, 1381640, 872825, 551928, 1412224, 685539, 821075, 107173, 679297, 205691, 135051, 470906, 393107, 1773176, 640904, 1574967, 1061215, 711179, 1971223, 309261, and 1549219
appear on all three days and receive a large number of texts (more than 100) but send out only a few. Based on this, it is hypothesized that these IDs correspond to park employees.
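The filter that produced this list can be sketched as a set intersection across days followed by two count thresholds. Assumptions, labeled: records are dicts with "from" and "to" fields, counts are totaled over all days (the text does not say whether the threshold applies per day), and max_sent is a hypothetical cutoff for "a small number of texts". The name likely_staff is also hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def likely_staff(days: Iterable[Iterable[Mapping[str, str]]],
                 min_received: int = 100, max_sent: int = 10) -> list[str]:
    """IDs present on every day that received more than `min_received` texts
    and sent at most `max_sent` texts, with counts totaled over all days.
    """
    present = None
    received, sent = Counter(), Counter()
    for records in days:
        ids = set()
        for r in records:
            ids.update((r["from"], r["to"]))
            received[r["to"]] += 1
            sent[r["from"]] += 1
        # Keep only IDs seen on every day processed so far.
        present = ids if present is None else present & ids
    candidates = (present or set()) - {"external"}
    return sorted(i for i in candidates
                  if received[i] > min_received and sent[i] <= max_sent)
```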


Figure 10



2.2.4 Pattern of communication from Kiddie Land

Bar plots were computed for the number of texts sent from Kiddie Land in each minute during park hours. Figures 11 and 12 show these bar plots for Friday and Saturday, respectively. The plot for Sunday was similar to Figure 11 (Friday). These two figures clearly show an increased volume of texts on Saturday near the 550th minute. It is hypothesized that Kiddie Land held a special event at that time on Saturday, drawing more visitors to the area.


Figure 11


Figure 12
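Locating a sustained excess such as the Saturday bump in Kiddie Land can be done with a sliding-window sum over the per-minute difference between two days. A minimal sketch; the function name peak_excess and the window length are hypothetical choices, and the inputs are assumed to be per-minute count lists aligned on the same minute index.

```python
from typing import Sequence


def peak_excess(target: Sequence[int], reference: Sequence[int], window: int = 5) -> int:
    """Return the start minute of the `window`-minute span where `target`
    most exceeds `reference` (largest summed difference)."""
    n = min(len(target), len(reference))
    if n < window:
        raise ValueError("series shorter than window")
    diff = [t - r for t, r in zip(target, reference)]
    best_start = 0
    best = current = sum(diff[:window])
    for start in range(1, n - window + 1):
        # Slide the window one minute: add the new minute, drop the old one.
        current += diff[start + window - 1] - diff[start - 1]
        if current > best:
            best_start, best = start, current
    return best_start


print(peak_excess([1, 1, 1, 9, 9, 9, 1, 1], [1] * 8, window=3))  # 3
```

Summing over a window rather than taking the single largest difference makes the result less sensitive to one noisy minute.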

MC2.3 From this data, can you hypothesize when the crime was discovered?  Describe your rationale.

Following the discussion in Section 2.2.2, it is hypothesized that the crime was discovered at noon on Sunday.